A prototype R package for modeling
tracking and event data in soccer
Other reasons…
data.table, vctrs, rlang
flowchart LR
A[DB]:::external --> B[socr]
B:::socr --> C("as_event() \n DT")
B:::socr --> D("as_tracking() \n DT")
C:::function --> E("as_action() \n rcrd")
D:::function --> F("as_position() \n rcrd")
E:::function --> G("metrics(a, p)")
F:::function --> G("metrics(a, p)")
G:::metrics --> H("models(trk, evt)"):::models
classDef external fill:#eee,stroke:#36454F,font-family:Fira Sans;
classDef socr font-family:Fira Sans;
classDef function fill:#f96,font-family:Fira Code,font-size:0.85em;
classDef metrics fill:#48C9B0,font-family:Fira Code,font-size:0.85em;
classDef models fill:#48A1C9,font-family:Fira Code,font-size:0.85em;
\(~\)
Coercion
Metrics
Models
period time entity x y team
1 1 0.04 player11 0.0984 38.5904 home
2 1 0.08 player11 0.1152 38.5904 home
3 1 0.12 player11 0.1368 38.5904 home
4 1 0.16 player11 0.1452 38.5904 home
5 1 0.20 player11 0.1548 38.5904 home
6 1 0.24 player11 0.1680 38.5904 home
Key: <entity, team, time>
period team entity time x y
<int> <char> <char> <num> <num> <num>
1: 1 ball ball 0.04 54.5664 30.9672
2: 1 ball ball 0.08 59.5740 32.5248
3: 1 ball ball 0.12 64.4592 34.0448
4: 1 ball ball 0.16 66.4152 33.7848
5: 1 ball ball 0.20 66.6144 32.4560
---
1639161: 1 home player9 2850.56 58.3908 84.0000
1639162: 1 home player9 2850.60 58.3908 84.0000
1639163: 1 home player9 2850.64 58.3908 84.0000
1639164: 1 home player9 2850.68 58.3908 84.0000
1639165: 1 home player9 2850.72 58.3908 84.0000
Key: <entity, team, time>
period team entity time x y
<int> <char> <char> <num> <num> <num>
1: 1 ball ball 0.04 54.5664 30.9672
2: 1 ball ball 0.08 59.5740 32.5248
3: 1 ball ball 0.12 64.4592 34.0448
4: 1 ball ball 0.16 66.4152 33.7848
5: 1 ball ball 0.20 66.6144 32.4560
---
1639161: 1 home player9 2850.56 58.3908 84.0000
1639162: 1 home player9 2850.60 58.3908 84.0000
1639163: 1 home player9 2850.64 58.3908 84.0000
1639164: 1 home player9 2850.68 58.3908 84.0000
1639165: 1 home player9 2850.72 58.3908 84.0000
\(~\)
# socr:::validate_tracking()
validate_tracking <- function(x) {
vec_assert(field(x, "period"), integer())
vec_assert(field(x, "entity"), character())
vec_assert(field(x, "team"), character())
vec_assert(field(x, "time"), double())
vec_assert(field(x, "x"), double())
vec_assert(field(x, "y"), double())
}# socr:::validate_tracking()
validate_tracking <- function(x) {
vec_assert(field(x, "period"), integer())
vec_assert(field(x, "entity"), character())
vec_assert(field(x, "team"), character())
vec_assert(field(x, "time"), double())
vec_assert(field(x, "x"), double())
vec_assert(field(x, "y"), double())
if (!identical(key(x), c("entity", "team", "time"))) {
setkeyv(x, cols = c("entity", "team", "time"))
}
x
}A position rcrd is a minimal representation of tracking data. It has five fields which must be vectors of the same length1.
A position rcrd is a minimal representation of tracking data. It has five fields which must be vectors of the same length.
A position rcrd is a minimal representation of tracking data. It has five fields which must be vectors of the same length.
\(~\)
\(~\)
<position[1639165]>
t: 0.04 e: ball <ball> xy: 54.6, 31.0
t: 0.08 e: ball <ball> xy: 59.6, 32.5
t: 0.12 e: ball <ball> xy: 64.5, 34.0
t: 0.16 e: ball <ball> xy: 66.4, 33.8
t: 0.20 e: ball <ball> xy: 66.6, 32.5
t: 0.24 e: ball <ball> xy: 66.8, 31.1
t: 0.28 e: ball <ball> xy: 67.0, 29.8
t: 0.32 e: ball <ball> xy: 67.2, 28.5
t: 0.36 e: ball <ball> xy: 67.4, 27.1
t: 0.40 e: ball <ball> xy: 67.6, 25.8
... with 1639155 more positions
Optionally, construct a position from scratch.
p_constructed <-
position(
entity = "Kerr", # vector recycling
team = "Chelsea",
time = 10:13,
x = c(20, 21, 23, 26),
y = c(50, 48, 47, 45)
)
print(p_constructed)<position[4]>
t: 10.00 e: Kerr <Chelsea> xy: 20.0, 50.0
t: 11.00 e: Kerr <Chelsea> xy: 21.0, 48.0
t: 12.00 e: Kerr <Chelsea> xy: 23.0, 47.0
t: 13.00 e: Kerr <Chelsea> xy: 26.0, 45.0
<position[4]>
t: 10.00 e: Kerr <Chelsea> xy: 20.0, 50.0
t: 11.00 e: Kerr <Chelsea> xy: 21.0, 48.0
t: 12.00 e: Kerr <Chelsea> xy: 23.0, 47.0
t: 13.00 e: Kerr <Chelsea> xy: 26.0, 45.0
All metrics are calculated from position or action objects
| Dimensions | Representation | Examples |
|---|---|---|
| 1 | vector |
|
| 2 | matrix |
|
| ≥ 3 | array |
|
By default, the displacement metric is calculated within-entity
<position[1639165]>
t: 0.04 e: ball <ball> xy: 54.6, 31.0
t: 0.08 e: ball <ball> xy: 59.6, 32.5
t: 0.12 e: ball <ball> xy: 64.5, 34.0
t: 0.16 e: ball <ball> xy: 66.4, 33.8
t: 0.20 e: ball <ball> xy: 66.6, 32.5
... with 1639160 more positions
By providing from and/or to, displacement is calculated between-entity
By providing from and/or to, displacement is calculated between-entity
x y xy
1 26.0976 7.9640 27.28571
2 24.2400 8.7288 25.76373
3 20.9676 6.8544 22.05953
4 18.7344 4.5152 19.27083
Providing multiple values in from and to returns an array with dimensions from \(\times\) to \(\times\) time \(\times\) 3 (x, y, z)
home_players <- vec_unique(get_entity(p, team = "home"))
d <- displacement(p, from = home_players, to = home_players)
round(d[, , 1, "x"], 2) player1 player10 player11 player2 player3 player4 player5 player6
player1 0.00 -26.84 38.81 -1.06 1.08 -0.02 -10.73 -10.37
player10 26.84 0.00 65.64 25.78 27.92 26.82 16.11 16.47
player11 -38.81 -65.64 0.00 -39.87 -37.72 -38.83 -49.53 -49.17
player2 1.06 -25.78 39.87 0.00 2.15 1.04 -9.66 -9.30
player3 -1.08 -27.92 37.72 -2.15 0.00 -1.11 -11.81 -11.45
player4 0.02 -26.82 38.83 -1.04 1.11 0.00 -10.70 -10.34
player5 10.73 -16.11 49.53 9.66 11.81 10.70 0.00 0.36
player6 10.37 -16.47 49.17 9.30 11.45 10.34 -0.36 0.00
player7 8.85 -17.99 47.66 7.79 9.94 8.83 -1.87 -1.51
player8 14.90 -11.94 53.70 13.84 15.98 14.88 4.17 4.53
player9 24.41 -2.43 63.22 23.35 25.50 24.39 13.69 14.05
player7 player8 player9
player1 -8.85 -14.90 -24.41
player10 17.99 11.94 2.43
player11 -47.66 -53.70 -63.22
player2 -7.79 -13.84 -23.35
player3 -9.94 -15.98 -25.50
player4 -8.83 -14.88 -24.39
player5 1.87 -4.17 -13.69
player6 1.51 -4.53 -14.05
player7 0.00 -6.05 -15.56
player8 6.05 0.00 -9.51
player9 15.56 9.51 0.00
When desired, metric functions can operate on position or action objects, handled via dispatch
tracking and/or event objectsposition and action objects internally, with object validationposition and action objectsdata.frame / data.table outputpitch_control <- function(tracking, grid = c(120, 80), cells = 200) {
validate_tracking(tracking)
players <- as_position(tracking, ball = FALSE, expand = TRUE)
# metrics
lo <- get_location(players)
mu <- expected_position(players)
th <- theta(players)
sr <- speed_ratio(players)
ir <- influence_radius(as_position(tracking, expand = TRUE))
}pitch_control <- function(tracking, grid = c(120, 80), cells = 200) {
...
# rotation and scaling matrices
rotation <- lapply(
th,
function(x) matrix(c(cos(x), sin(x), -sin(x), cos(x)), nrow = 2)
)
scaling <- purrr::pmap(
list(as.numeric((1 + sr) * ir / 2),
as.numeric((1 - sr) * ir / 2)),
function(x, y) matrix(c(x, 0, 0, y), nrow = 2)
)
# sigma
sigma <- purrr::pmap(
list(rotation, scaling),
function(x, y) x %*% y %*% y %*% solve(x)
)
}pitch_control <- function(tracking, grid = c(120, 80), cells = 200) {
...
# influence by player for each pitch cell
pitch_influence <- purrr::pmap(
list(mu, sigma),
function(x, y) mvtnorm::dmvnorm(pitch, x, y)
)
# influence of player at location
location_influence <- purrr::pmap_dbl(
list(lo, mu, sigma),
function(x, y, z) mvtnorm::dmvnorm(x, y, z)
)
influence <- do.call(cbind, pitch_influence) %*%
Matrix::Diagonal(x = 1 / location_influence)
}pitch_control <- function(tracking, grid = c(120, 80), cells = 200) {
...
# multiply by -1 for summation convenience
influence[, away_team] <- influence[, away_team] * -1
summation <- rowsum(x = t(as.matrix(influence)),
group = get_time(players),
na.rm = TRUE)
pc <- 1 / (1 + exp(summation))
# return as data.table
}trk_short <- trk[time %in% seq(834, 857, by = 0.2)]
bench::mark(
pc <- pitch_control(trk_short, grid = c(120, 80), cells = 200),
max_iterations = 1
)$median[1] 7.92s
time x y control
<num> <num> <num> <num>
1: 834.2 0 0.0000000 0.5
2: 834.2 0 0.4020101 0.5
3: 834.2 0 0.8040201 0.5
4: 834.2 0 1.2060302 0.5
5: 834.2 0 1.6080402 0.5
---
4599996: 857.0 120 78.3919598 0.5
4599997: 857.0 120 78.7939698 0.5
4599998: 857.0 120 79.1959799 0.5
4599999: 857.0 120 79.5979899 0.5
4600000: 857.0 120 80.0000000 0.5
Implementation of Fernández and Bornn (2018), Wide Open Spaces, MIT SSAC
Currently implementing formation classification model from Shaw and Glickman (2019), Barca Sports Analytics Summit
Requires additional metrics, e.g. possession which uses action object
No plans to release
Scaling models like pitch_control() to multiple matches
Finding a good minimal representation of event data for as_action()
Managing dependencies for machine learning (tidymodels, tensorflow, torch)
Thanks!
George Wood